A Novel Model for Arbitration between Planning and Habitual Control Systems
Authors
Abstract
It is well established that human decision making and instrumental control use multiple systems, some of which use habitual action selection and some of which require deliberate planning. Deliberate planning systems predict action outcomes using an internal model of the agent’s environment, while habitual action selection systems learn to automate behavior by repeating previously rewarded actions. Habitual control is computationally efficient but may be inflexible in changing environments. Conversely, deliberate planning may be computationally expensive but is flexible in dynamic environments. This paper proposes a general architecture comprising both control paradigms by introducing an arbitrator that controls which subsystem is used at any time. This system is implemented for a target-reaching task with a simulated two-joint robotic arm, and comprises a supervised internal model and deep reinforcement learning. Through permutation of target-reaching conditions, we demonstrate that the proposed model is capable of rapidly learning the kinematics of the system without a priori knowledge, and is robust to (A) changing environmental reward and kinematics, and (B) occluded vision. The arbitrator model is compared to instances of the model that use exclusively deliberate planning with the internal model or exclusively habitual control. The results show how such a model can harness the benefits of both systems, using fast decisions in reliable circumstances while optimizing performance in changing environments. In addition, the proposed model learns very fast. Finally, the system which includes internal models is able to reach the target under the visual
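The abstract does not state the rule the arbitrator uses, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes, hypothetically, that the arbitrator tracks a running average of recent outcome errors and hands control to the cheap habitual system only while that average stays below a threshold. The class name `Arbitrator`, the `threshold` and `window` parameters, and the error signal itself are all illustrative assumptions.

```python
from collections import deque


class Arbitrator:
    """Toy arbitrator (illustrative assumption, not the paper's rule).

    Chooses the habitual controller while recent outcomes have been
    reliable, and falls back to deliberate planning otherwise.
    """

    def __init__(self, threshold=0.1, window=5):
        self.threshold = threshold          # hypothetical reliability cutoff
        self.errors = deque(maxlen=window)  # most recent outcome errors

    def record(self, predicted, observed):
        """Store the absolute gap between expected and observed outcome."""
        self.errors.append(abs(predicted - observed))

    def is_reliable(self):
        """Reliable only once the window is full and the mean error is low."""
        if len(self.errors) < self.errors.maxlen:
            return False
        return sum(self.errors) / len(self.errors) < self.threshold

    def choose(self):
        """Return the name of the subsystem that should act next."""
        return "habitual" if self.is_reliable() else "planning"


if __name__ == "__main__":
    arb = Arbitrator(threshold=0.1, window=3)
    print(arb.choose())          # no evidence yet, so plan deliberately
    for _ in range(3):
        arb.record(1.0, 0.98)    # small errors: environment looks stable
    print(arb.choose())
    arb.record(1.0, 0.0)         # large error: environment has changed
    print(arb.choose())
```

Under these assumptions, the sketch starts with planning, switches to the habitual system after a run of accurate outcomes, and returns to planning as soon as a large error raises the windowed average. This mirrors the trade-off the abstract describes: fast decisions in reliable circumstances, deliberate planning when the environment changes.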
Related resources
Cost-Benefit Arbitration Between Multiple Reinforcement-Learning Systems.
Human behavior is sometimes determined by habit and other times by goal-directed planning. Modern reinforcement-learning theories formalize this distinction as a competition between a computationally cheap but inaccurate model-free system that gives rise to habits and a computationally expensive but accurate model-based system that implements planning. It is unclear, however, how people choose ...
A novel mathematical model for a hybrid flow shop scheduling problem under buffer and resource limitations-A case study
Scheduling problems play a major role in manufacturing and production planning by increasing production efficiency and assigning resources to operations. Furthermore, in many manufacturing systems there is a physical space between stages, called an intermediate buffer. In this study, a model is proposed for minimizing the makespan of a hybrid flow shop scheduling problem with inter...
A Stochastic Operational Planning Model for Smart Power Systems
Smart grids are the result of utilizing novel technologies, such as distributed energy resources and communication technologies, in the power system to compensate for some of its defects. Various power resources provide some benefits in the operation domain; however, the power system operator should use a powerful methodology to manage them. Renewable resources and load add uncertainty to the problem. So, independe...
Actions, Action Sequences and Habits: Evidence That Goal-Directed and Habitual Action Control Are Hierarchically Organized
Behavioral evidence suggests that instrumental conditioning is governed by two forms of action control: a goal-directed and a habit learning process. Model-based reinforcement learning (RL) has been argued to underlie the goal-directed process; however, the way in which it interacts with habits and the structure of the habitual process has remained unclear. According to a flat architecture, the...
Habitual control of goal selection in humans.
Humans choose actions based on both habit and planning. Habitual control is computationally frugal but adapts slowly to novel circumstances, whereas planning is computationally expensive but can adapt swiftly. Current research emphasizes the competition between habits and plans for behavioral control, yet many complex tasks instead favor their integration. We consider a hierarchical architectur...
Journal title:
- CoRR
Volume abs/1712.02441
Publication date 2017